Pre-trained Language Models


Abstract

This chapter presents the main architecture types of attention-based language models, which describe the distribution of tokens in texts: Autoencoders similar to BERT receive an input text and produce a contextual embedding for each token. Autoregressive models similar to GPT receive a subsequence of tokens as input and predict the next token; in this way, all tokens of a text can be generated successively. Transformer Encoder-Decoders have the task of translating one sequence into another sequence, e.g. for language translation. First they generate contextual embeddings for the input tokens with an autoencoder; then these embeddings are used as input to an autoregressive model, which sequentially generates the output tokens. These models are usually pre-trained on a large general training set and often fine-tuned for a specific task. Therefore, they are collectively called Pre-trained Language Models (PLMs). When the number of parameters of these models gets large, they can often be instructed by prompts and are then called Foundation Models. In further sections we describe details of the optimization and regularization methods used for training. Finally, we analyze the uncertainty of model predictions and how predictions may be explained.
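
As a rough, non-authoritative illustration of these three architecture types, the following Python sketch loads one representative public checkpoint of each kind with the Hugging Face transformers library. The checkpoint names bert-base-uncased, gpt2 and t5-small are assumptions chosen for convenience (any autoencoder, autoregressive and encoder-decoder model would do), and running the sketch downloads the corresponding weights.

# Sketch only: one autoencoder, one autoregressive model and one
# encoder-decoder, loaded via the Hugging Face transformers API.
from transformers import (
    AutoTokenizer,
    AutoModel,                 # autoencoder, BERT-like
    AutoModelForCausalLM,      # autoregressive, GPT-like
    AutoModelForSeq2SeqLM,     # encoder-decoder, e.g. for translation
)

text = "Pre-trained language models describe the distribution of tokens in texts."

# 1) Autoencoder: a contextual embedding for each input token.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
embeddings = bert(**bert_tok(text, return_tensors="pt")).last_hidden_state
print(embeddings.shape)        # (1, number_of_tokens, hidden_size)

# 2) Autoregressive model: predict the next token and generate successively.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
ids = gpt.generate(**gpt_tok("Pre-trained language models", return_tensors="pt"),
                   max_new_tokens=20)
print(gpt_tok.decode(ids[0]))

# 3) Encoder-decoder: encode the input sequence, then decode the output sequence.
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
ids = t5.generate(**t5_tok("translate English to German: The house is small.",
                           return_tensors="pt"), max_new_tokens=20)
print(t5_tok.decode(ids[0], skip_special_tokens=True))

The same models could then be fine-tuned on a task-specific dataset or, for very large models, steered with prompts alone, as the abstract describes.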


Related articles

ImageNet pre-trained models with batch normalization

Convolutional neural networks (CNN) pre-trained on ImageNet are the backbone of most state-of-the-art approaches. In this paper, we present a new set of pretrained models with popular state-of-the-art architectures for the Caffe framework. The first release includes Residual Networks (ResNets) with generation script as well as the batch-normalization-variants of AlexNet and VGG19. All models ou...


Automatic Analysis of Flaws in Pre-Trained NLP Models

Most tools for natural language processing (NLP) today are based on machine learning and come with pre-trained models. In addition, third-parties provide pre-trained models for popular NLP tools. The predictive power and accuracy of these tools depends on the quality of these models. Downstream researchers often base their results on pre-trained models instead of training their own. Consequentl...


Unregistered Multiview Mammogram Analysis with Pre-trained Deep Learning Models

We show two important findings on the use of deep convolutional neural networks (CNN) in medical image analysis. First, we show that CNN models that are pre-trained using computer vision databases (e.g., Imagenet) are useful in medical image applications, despite the significant differences in image appearance. Second, we show that multiview classification is possible without the pre-registrati...


Pre-Computable Multi-Layer Neural Network Language Models

In the last several years, neural network models have significantly improved accuracy in a number of NLP tasks. However, one serious drawback that has impeded their adoption in production systems is the slow runtime speed of neural network models compared to alternate models, such as maximum entropy classifiers. In Devlin et al. (2014), the authors presented a simple technique for speeding up f...


Feed forward pre-training for recurrent neural network language models

The recurrent neural network language model (RNNLM) has been demonstrated to consistently reduce perplexities and automatic speech recognition (ASR) word error rates across a variety of domains. In this paper we propose a pre-training method for the RNNLM, by sharing the output weights of the feed forward neural network language model (NNLM) with the RNNLM. This is accomplished by first fine-tu...
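
To make the weight-sharing idea in this snippet concrete, here is a small, purely hypothetical PyTorch sketch (the class names, layer sizes and the plain nn.RNN cell are my assumptions, not details from the paper): both language models end in an output layer of identical shape, so the output weights of a previously trained feed-forward NNLM can simply be copied into the RNNLM before its recurrent training begins.

# Hypothetical sketch of output-weight sharing between a feed-forward
# NNLM and an RNNLM; all sizes are illustrative.
import torch
import torch.nn as nn

VOCAB, EMB, HIDDEN, CONTEXT = 10_000, 128, 256, 4

class FeedForwardLM(nn.Module):
    """n-gram-style NNLM: a fixed context window maps to next-token logits."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.hidden = nn.Linear(CONTEXT * EMB, HIDDEN)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, ctx):                       # ctx: (batch, CONTEXT)
        h = torch.tanh(self.hidden(self.emb(ctx).flatten(1)))
        return self.out(h)                        # (batch, VOCAB)

class RNNLM(nn.Module):
    """Recurrent LM whose output layer has the same shape as the NNLM's."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.RNN(EMB, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, seq):                       # seq: (batch, time)
        h, _ = self.rnn(self.emb(seq))
        return self.out(h)                        # (batch, time, VOCAB)

ff_lm = FeedForwardLM()     # assume this model has already been trained
rnn_lm = RNNLM()

# The sketched pre-training step: copy (share) the NNLM output weights
# into the RNNLM, which is then trained as usual from this warm start.
rnn_lm.out.load_state_dict(ff_lm.out.state_dict())

logits = rnn_lm(torch.randint(0, VOCAB, (2, 7)))
print(logits.shape)         # torch.Size([2, 7, 10000])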


Journal

Journal title: Artificial intelligence: Foundations, theory, and algorithms

Year: 2023

ISSN: 2365-3051, 2365-306X

DOI: https://doi.org/10.1007/978-3-031-23190-2_2